Retranslating number expression unknown word in Chinese-Vietnamese statistical machine translation
Similar resources
A Novel Approach for Handling Unknown Word Problem in Chinese-Vietnamese Machine Translation
For languages in which spaces do not mark word boundaries, such as Chinese and Vietnamese, word segmentation is always the first task in a statistical machine translation (SMT) system. Word segmentation improves translation quality, but it causes many unknown words (UKW) in the target translation. In this paper, we present a novel approach to translating UKW. Based on the ...
Integrated Chinese word segmentation in statistical machine translation
A Chinese sentence is represented as a sequence of characters, and words are not separated from each other. In statistical machine translation, the conventional approach is to segment the Chinese character sequence into words during the pre-processing. The training and translation are performed afterwards. However, this method is not optimal for two reasons: 1. The segmentations may be erroneou...
Improved Statistical Machine Translation by Multiple Chinese Word Segmentation
Chinese word segmentation (CWS) is a necessary step in Chinese-English statistical machine translation (SMT), and its performance has an impact on the results of SMT. However, there are many settings involved in creating a CWS system, such as various specifications and CWS methods. This paper investigates the effect of these settings on SMT. We tested dictionary-based and CRF-based approaches and ...
Do We Need Chinese Word Segmentation for Statistical Machine Translation?
In Chinese texts, words are not separated by white spaces. This is problematic for many natural language processing tasks. The standard approach is to segment the Chinese character sequence into words. Here, we investigate Chinese word segmentation for statistical machine translation. We pursue two goals: the first one is the maximization of the final translation quality; the second is the mini...
Bayesian Semi-Supervised Chinese Word Segmentation for Statistical Machine Translation
Words in Chinese text are not naturally separated by delimiters, which poses a challenge to standard machine translation (MT) systems. In MT, the widely used approach is to apply a Chinese word segmenter trained from manually annotated data, using a fixed lexicon. Such word segmentation is not necessarily optimal for translation. We propose a Bayesian semi-supervised Chinese word segmentation m...
Journal
Journal title: Journal of Computer Science and Cybernetics
Year: 2014
ISSN: 1813-9663
DOI: 10.15625/1813-9663/30/2/2589